Introduction to deep learning

Lisa Bonheme and Marek Grzes

University of Kent

COMP6360/8360, Teaching week 12

Last modified 13/11/2022

Content

Installing TensorFlow on Anaconda

During this class, we will need TensorFlow 2, which is not installed in Miniconda by default. You can install it using the Anaconda Navigator as follows:

(Images are taken from this tutorial)

You can also install TensorFlow and other Python packages using pip. To do so, open a terminal window and, assuming that your Python environment is active in that window, install TensorFlow by typing: pip install tensorflow.

Overview of the Moon dataset

The Moon dataset is an artificial dataset with two intertwined moon shapes belonging to two different classes.

The dataset is composed of 10000 data points with the following features:

So, our data matrix is of size (10000, 2); that is (nb_data_points, nb_features).
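As a sketch, such a dataset can be generated with scikit-learn's make_moons function; the noise level and random seed below are illustrative choices, not the exact values used in this class:

```python
from sklearn.datasets import make_moons

# Generate 10000 two-dimensional points in two interleaved half-moons.
# noise and random_state are illustrative, not fixed by this worksheet.
X, y = make_moons(n_samples=10000, noise=0.1, random_state=42)

print(X.shape)  # (10000, 2) -> (nb_data_points, nb_features)
print(y.shape)  # (10000,)   -> one binary class label per point
```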

Now, let us visualise what this dataset looks like.

Question 1 - Comparing logistic regression and deep learning

Now that we have our dataset, we will create a function to plot the decision boundary of our models and use it to compare the behaviour of a logistic regression model and a deep learning model. Note that logistic regression is another name for our familiar delta rule when the sigmoid activation function is used.

Plotting decision boundaries

Nothing to do here, but you can explore this function to see what it does if you are curious about it.

Note that this part of the class is inspired by the scikit-learn example on multinomial logistic regression.

The decision boundary of a logistic regression model

Let us train a logistic regression model and plot its decision boundaries after training.
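As a hedged sketch of this step (the class notebook defines its own plotting helper, which may differ), logistic regression and a grid of boundary predictions can be obtained with scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

# Illustrative data settings; the notebook may use different ones.
X, y = make_moons(n_samples=10000, noise=0.1, random_state=42)
clf = LogisticRegression().fit(X, y)

# A linear model can only draw a straight decision boundary, so its
# accuracy on the intertwined moons is limited.
acc = clf.score(X, y)
print(f"training accuracy: {acc:.2f}")

# Sample the model's predictions on a grid of points covering the data;
# the boundary is where the predicted class changes.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 200),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```

Plotting Z with a filled contour (e.g. plt.contourf) then shows the two decision regions separated by a straight line.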

The decision boundary of a deep neural network

Now, let us create a simple neural network with no hidden layers and train it for a few epochs.

At the end of each epoch, we will plot the decision boundaries using the BoundariesCallback defined below.

Below is the deep model that we will use in the rest of this section.

Now that we have the definition of our deep learning model, we can compute its decision boundaries during training.

Below, we are going to create and train a very simple neural network with no hidden layers.
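One possible way to define such a network in TensorFlow 2 is sketched below; the optimizer setting is an assumption, not the exact notebook code. With no hidden layers, a single sigmoid output unit is equivalent to logistic regression:

```python
import tensorflow as tf

# Sketch of a "deep" model with no hidden layers: one Dense output unit
# with sigmoid activation, i.e. logistic regression in Keras clothing.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                     # two input features
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output unit
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.005),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```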

Question 2 - The decision boundaries of deep neural networks

Let us repeat the last experiment with a hidden layer added to our neural network.
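A hedged sketch of the extended architecture is shown below; the hidden-layer width (16 units) and its activation are illustrative assumptions, not the notebook's exact settings:

```python
import tensorflow as tf

# Same sketch as before, now with one hidden layer. A non-linear hidden
# layer lets the network bend its decision boundary around the two moons.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output unit
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.005),
              loss="binary_crossentropy",
              metrics=["accuracy"])
```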

Impact of the activation function

Using a deep neural network with one hidden layer, let us now investigate the impact of activation functions on our hidden layer and the final predictions.

Here, we use the "ReLU" function, which stands for Rectified Linear Unit. This function transforms the outputs of your hidden layer by keeping only the positive part, so that $f(x) = \max(0, x)$. In this formula, $x$ is the net input and $f(x)$ is the activation.
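The formula above can be illustrated with a few lines of numpy:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: keep only the positive part of the net input."""
    return np.maximum(0.0, x)

net = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(net))  # negative net inputs become 0; positive ones pass through
```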

Impact of the learning rate

The default learning rate is 0.005. Change its value, observe the results, and then analyse them. You are encouraged to run this test with both linear and ReLU activations in the hidden layer. See the comments in the code below.
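The effect of the learning rate can also be seen on a toy one-dimensional problem; this sketch is independent of the notebook code and only illustrates the general behaviour:

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimise f(w) = w**2 (gradient 2w) from w0 with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # gradient-descent update
    return w

# A tiny learning rate converges slowly towards the minimum at 0; a larger
# one converges faster; a learning rate above 1.0 overshoots and diverges.
print(abs(gradient_descent(0.005)))  # slow progress towards 0
print(abs(gradient_descent(0.1)))    # much closer to 0
print(abs(gradient_descent(1.5)))    # diverges: |w| grows at every step
```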

Question 3 - Implement the backpropagation algorithm (from scratch)

Now that you have seen how a deep model can be implemented using TensorFlow 2, you will create your own implementation of a deep learning algorithm! We ask you to code the backpropagation algorithm that was presented in our lectures. All the equations required for this implementation are in our lecture slides. You will need to transfer them to your Python code below.

The sigmoid function

We have previously used the sigmoid (and ReLU) activation functions in our deep model, and we will need the sigmoid again for this question. If we use the sigmoid in this section, we can reuse the equations that we have in our lecture slides.
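A minimal numpy sketch of the sigmoid, $\sigma(x) = 1/(1+e^{-x})$, together with the derivative $\sigma(x)(1-\sigma(x))$ that backpropagation will need:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid, expressed via the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```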

The backpropagation algorithm

Now that our sigmoid function is ready, let us define a skeleton for our custom deep learning algorithm.

The backpropagation algorithm for a feedforward network of two layers of sigmoid units can be defined as follows:

For each $(x, y)$ in the training examples, DO:

This procedure is based on Table 4.2 in Machine Learning, Tom Mitchell, McGraw-Hill Education, 1997. Note that the example in this book assumes that the output unit uses sigmoid activation and the SSE error is optimised. In our discussion of the delta rule with sigmoid activation, we assumed the CE error, which made the update equation of the output units slightly simpler. We used this assumption in the pseudocode above.
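As a hedged illustration of this procedure (not the MyDeepModel class itself), here is one way a two-layer network of sigmoid units trained with the CE error could be sketched in numpy; with a sigmoid output and CE error the output delta simplifies to $o - y$, matching the assumption above. Network sizes, the learning rate, and the XOR toy data are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 2 inputs, 4 hidden sigmoid units, 1 sigmoid output.
n_in, n_hidden = 2, 4
W1 = rng.normal(scale=0.5, size=(n_hidden, n_in)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=n_hidden);         b2 = 0.0
lr = 0.5

# Toy training set: XOR, which a network with no hidden layer cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0.0, 1.0, 1.0, 0.0])

for epoch in range(5000):
    for x, y in zip(X, Y):
        # Forward pass.
        h = sigmoid(W1 @ x + b1)   # hidden activations
        o = sigmoid(W2 @ h + b2)   # output activation

        # Backward pass. With sigmoid output and CE error the output delta
        # is simply (o - y); hidden deltas use sigmoid'(net) = h * (1 - h).
        delta_o = o - y
        delta_h = h * (1.0 - h) * (W2 * delta_o)

        # Stochastic gradient-descent weight updates.
        W2 -= lr * delta_o * h
        b2 -= lr * delta_o
        W1 -= lr * np.outer(delta_h, x)
        b1 -= lr * delta_h

preds = np.array([sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2) for x in X])
print(np.round(preds, 2))  # should approach [0, 1, 1, 0]
```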

Using the class MyDeepModel below, do the following:

Question 4 - Tune your model

If you have finished early, and you'd like something more challenging to do, you can add a few options to refine your model. For example: